1. INTRODUCTION

Crude oil is one of the most important energy sources and plays a crucial role in the world economy.
Diesel, gasoline, heating oil, lubricants and other petrochemicals are some of the end products
produced from crude oil, and these products are indispensable in daily life. Economic progress,
social balance and national security can be affected significantly by changes in crude oil prices [1].
For instance, a rise in crude oil prices typically raises gasoline prices, which in turn affects the
fundamental goods and services needed by citizens. Just like other commodities, the fluctuation of crude
oil prices is basically determined by supply and demand. Other factors such as natural disasters, political
events and military conflicts also affect crude oil prices [2]. For example, the 2005 hurricane
season led to the closure of oil and natural gas production facilities as well as refineries. Consequently,
prices for petroleum-based products rose substantially as market supplies deteriorated. These issues
illustrate how much crude oil price fluctuations matter. For that reason, awareness of crude oil price
fluctuations is crucial, and one way of gaining it is to utilize time series forecasting methods that have
been proposed and validated in many studies.

Time series forecasting can assist practitioners in predicting forthcoming movements, thus providing
an advantage for future planning. An example of a time series forecasting method is the Autoregressive
Integrated Moving Average (ARIMA) method. This single forecasting method was used by Yusof et al. [3] in
their study to predict crude oil production in Malaysia. Hybrid forecasting methods have also been proposed
by researchers to improve forecasting accuracy. Hybrid forecasting models have been shown to be more
powerful than single models because the combination draws on the strengths of the single models while
eliminating their individual limitations [4]. Recently, a novel “decomposition-and-ensemble” framework for
hybrid forecasting has proven its efficacy in time series forecasting [5]. It breaks down an intricate time
series into simpler components, forecasts them individually and ultimately ensembles the forecasts into a
final result. One example is the crude oil price forecasting method proposed by Shabri et al. [6], which is
based on hybridizing the wavelet transform and an artificial neural network (ANN) model. Their study
concluded that the hybrid method produced a large forecasting improvement compared with the ANN model alone.
Furthermore, Wang et al. [7] conducted a study on crude oil price forecasting using an ARIMA and Back
Propagation Neural Network combinatorial algorithm. Their results showed that the combinatorial algorithm
achieved higher forecasting accuracy than either method used individually.

Even though many hybrid forecasting methods have shown great improvement in crude oil price
forecasting, many limitations can still be observed, and a method that can predict the crude oil spot price
(COSP) as accurately as possible is still needed. A categorization of the existing hybrid forecasting
methods for crude oil prices can indicate how a better method might be proposed, by showing which category
performs better than the others. Apart from that, the limitations of each category must also be analyzed so
that future research can take them into account when developing new hybrid forecasting methods. Therefore,
the objectives of this review paper are to identify and categorize the existing hybrid forecasting models
for crude oil prices and to analyse the limitations of each category. This review focuses only
on hybrid models that make use of the “decomposition-and-ensemble” framework because it has gained a
lot of attention lately and has been implemented in many recent studies. The paper is structured as
follows. Section 2 briefly explains the forecasting methods that are incorporated into the hybrid
models of the selected studies. Section 3 reviews the existing hybrid forecasting models, including the
categorization details and the limitations identified for each category.
Lastly, Section 4 concludes the review with some recommendations for future investigation.


2. HYBRID FORECASTING METHODS 

As stated earlier, the “decomposition-and-ensemble” framework is getting a lot of
attention in hybrid forecasting. The initial step in this framework is usually called data
decomposition. The wavelet transform is one of the popular data decomposition methods incorporated in
hybrid forecasting models. In general, wavelets mean small waves [8]. They exhibit a distinct amount of
fluctuation and can therefore be used to represent variables in time or space. According to
Schlüter et al. [9], the wavelet transform acts as a data decomposition method that breaks an original data
series down into a linear combination of distinct frequencies. Therefore, it can localize and pinpoint the
different frequencies in a time series [10]. Figure 1 and Figure 2, which are taken from Md-Khair et al.
[11], show the monthly crude oil price series from WTI and the component series decomposed
using the wavelet transform, respectively. Notice that in Figure 2 the smooth series follows the original
price series but is more stable in terms of volatility, because the original fluctuations are captured by the
detail series. Another commonly used data decomposition technique is empirical mode decomposition (EMD),
first proposed by Huang et al. [12]. Yu et al. [13] explained that the fundamental concept of EMD is to
decompose a time series into several oscillatory functions called intrinsic mode functions (IMFs).
In addition, its decomposition is based on the local characteristic time scale of the data series, which
makes it very efficient. Nevertheless, the main flaw of this method is the mode mixing problem, which can
weaken the physical meaning of the IMFs [14], [15]. This led to another data decomposition method
called ensemble EMD (EEMD), introduced by Wu et al. [16] to overcome this weakness. Unlike the wavelet
transform, EEMD does not require a basis function for decomposition and expects only two parameters:
the number of ensemble members and the standard deviation of the Gaussian white noise [5].
Figure 1. Monthly dataset for WTI COSP
Figure 2. Decomposed wavelets for WTI dataset
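
As an illustration of how a decomposition yields a smooth series and a detail series, the sketch below
applies a one-level, unnormalised Haar transform, the simplest wavelet, to a short series. The Haar basis,
the single decomposition level, the function name haar_decompose and the toy price values are assumptions
made for this example; they are not taken from Md-Khair et al. [11] or from the other reviewed studies.

```python
def haar_decompose(series):
    """Split an even-length series into a smooth and a detail component.

    Uses an unnormalised one-level Haar transform: each adjacent pair is
    replaced by its mean (smooth) and its deviation from that mean (detail),
    so that smooth[i] + detail[i] == series[i] for every i.
    """
    if len(series) % 2 != 0:
        raise ValueError("series length must be even")
    smooth, detail = [], []
    for i in range(0, len(series), 2):
        mean = (series[i] + series[i + 1]) / 2.0
        smooth.extend([mean, mean])
        detail.extend([series[i] - mean, series[i + 1] - mean])
    return smooth, detail


# Hypothetical monthly prices, for illustration only.
prices = [60.0, 62.0, 58.0, 61.0, 65.0, 63.0, 70.0, 66.0]
smooth, detail = haar_decompose(prices)

# The decomposition is additive, so the components sum back to the input.
reconstructed = [s + d for s, d in zip(smooth, detail)]
```

Because the two components sum back to the original series, forecasts made on each component can later be
added to obtain a forecast of the price itself.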


After the data decomposition step is done, each individual component is forecasted using a suitable
forecasting method. Researchers use two main types of forecasting methods: statistical methods and
artificial intelligence (AI) methods. Statistical methods are broadly used because they are
fairly simple to understand and implement and can be analyzed in great detail [17].
One example of a statistical method is the autoregressive integrated moving average (ARIMA) model. It is
built on several simpler models, namely the autoregressive (AR) model, the moving average (MA) model and the
combination of AR and MA known as the ARMA model [18]. The ARMA model is used to forecast stationary data
series. In cases where the data series is non-stationary, the series must first be differenced, which gives
the ARIMA model, where the “I” (integrated) denotes the differencing step. A popular ARIMA modelling
procedure introduced by Box and Jenkins consists of model identification, parameter estimation and model
validation [19]. Figure 3, taken from Md-Khair et al. [11], presents a typical flowchart of the Box-Jenkins
ARIMA model.
Figure 3. Flowchart of Box-Jenkins ARIMA model
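
The differencing and autoregressive steps can be sketched as follows. This is a simplified
ARIMA(1,1,0)-style example without an intercept or moving average terms, fitted by ordinary least squares.
It does not implement the full Box-Jenkins identification, estimation and validation procedure, and the
function names and the toy series are hypothetical.

```python
def difference(series):
    """First-order differencing: the 'I' step of ARIMA."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


def fit_ar1(series):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] + e[t] (no intercept)."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return num / den


def forecast_arima_110(series):
    """One-step-ahead forecast: fit AR(1) on the differences, then undo the differencing."""
    diffs = difference(series)
    phi = fit_ar1(diffs)
    next_diff = phi * diffs[-1]
    return series[-1] + next_diff


# Hypothetical non-stationary series whose differences alternate in sign.
series = [100.0, 102.0, 101.0, 103.0, 102.0, 104.0]
forecast = forecast_arima_110(series)
```

The model is fitted on the differenced series, which is assumed to be stationary, and the forecast is mapped
back to the price level by adding the predicted difference to the last observation.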


Unfortunately, statistical forecasting methods rely on the assumption that the pattern which existed
in the past will hold true in the future and is linear in nature. This means that they can only provide good
forecasting results for linear and near-linear data series and are not suitable for non-stationary or
non-linear data series such as crude oil prices [6]. Therefore, AI forecasting methods were introduced.
AI methods imitate the human mind in problem solving, so they can be exploited to find approximate answers to
real-world problems that contain inaccuracies and uncertainties. An example of an AI method is the artificial
neural network (ANN). According to Zhang et al. [17], the ANN model was introduced to time series forecasting
in 1964 and has been widely used in that field since then. Basically, an ANN is a mathematical
model with a highly connected structure [6]. The structure consists of an input layer where data are
presented to the network, a hidden layer where the processing is done and an output layer where the results
are produced. Nonetheless, ANN suffers from local minima and overfitting.
Furthermore, it is challenging to determine the network structure because it can only be established using
a trial-and-error approach [20].
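
The input, hidden and output layers described above can be sketched as a single forward pass. The example
assumes one hidden layer with a tanh activation and a linear output neuron, and it uses lagged values of the
series as inputs; the function names, the fixed weights and the toy series are hypothetical, and training
(for example by back-propagation) is not shown.

```python
import math


def make_windows(series, n_lags):
    """Turn a series into (lagged inputs, next value) pairs."""
    return [
        (series[t - n_lags:t], series[t])
        for t in range(n_lags, len(series))
    ]


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a one-hidden-layer feedforward network.

    inputs          -- values presented to the input layer
    hidden_weights  -- one list of input weights per hidden neuron
    output_weights  -- one weight per hidden neuron
    """
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical, untrained weights for a network with 2 inputs and 2 hidden neurons.
hidden_weights = [[1.0, 1.0], [1.0, -1.0]]
hidden_biases = [0.0, 0.0]
output_weights = [2.0, 3.0]
output_bias = 0.1

pairs = make_windows([0.1, 0.5, -0.5, 0.2], n_lags=2)
prediction = forward(pairs[1][0], hidden_weights, hidden_biases, output_weights, output_bias)
```

The number of lags and the number of hidden neurons are exactly the structural choices that, as noted above,
usually have to be settled by trial and error.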

In addition to ANN, the least squares support vector machine (LSSVM) is another commonly
used AI forecasting method. LSSVM is based on the support vector machine (SVM), which is used in
classification and nonlinear function estimation. LSSVM was proposed by Suykens et al. [21] to address the
major drawback of SVM, namely the high computational burden of its constrained optimization programming
[22]. LSSVM only needs to solve a set of linear equations rather than a quadratic programming problem, which
is much easier and computationally simpler. Furthermore, the LSSVM method uses equality constraints instead
of inequality constraints and adopts a least squares linear system as its loss function, making it
computationally attractive, good in convergence and high in precision. Figure 4, taken from Md-Khair et al.
[11], shows a typical design for LSSVM model development.
Figure 4. LSSVM model development
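
To show why LSSVM training reduces to a set of linear equations, the sketch below builds and solves the
standard LSSVM system for function estimation, in which the first equation forces the dual weights to sum to
zero and the remaining equations involve the kernel matrix plus a ridge term 1/gamma on the diagonal. The RBF
kernel, the scalar inputs, the parameter values and all function names are assumptions made for this
example, and a small Gaussian elimination routine is included only to keep the example free of third-party
dependencies.

```python
import math


def rbf(a, b, sigma):
    """Gaussian (RBF) kernel between two scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def solve(matrix, rhs):
    """Solve a dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def lssvm_fit(xs, ys, gamma, sigma):
    """Return (bias, alphas) by solving the LSSVM linear system."""
    n = len(xs)
    system = [[0.0] + [1.0] * n]  # first equation: sum of alphas equals zero
    for i in range(n):
        row = [1.0] + [rbf(xs[i], xs[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # regularisation term on the diagonal
        system.append(row)
    solution = solve(system, [0.0] + list(ys))
    return solution[0], solution[1:]


def lssvm_predict(x, xs, bias, alphas, sigma):
    """Evaluate the fitted function at a new input x."""
    return sum(a * rbf(x, xi, sigma) for a, xi in zip(alphas, xs)) + bias


# Hypothetical training data, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0]
bias, alphas = lssvm_fit(xs, ys, gamma=10.0, sigma=1.0)
estimate = lssvm_predict(2.5, xs, bias, alphas, sigma=1.0)
```

No quadratic programming solver is needed: the only numerical work is one linear solve whose size is the
number of training points plus one.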


3. REVIEW ON HYBRID FORECASTING METHODS

A total of 12 studies that proposed hybrid methods for crude oil price forecasting are
included in this review. Only methods that follow the “decomposition-and-ensemble” framework
were selected because this framework has been shown to increase forecasting accuracy significantly.
Moreover, the framework has gained a lot of attention and has been implemented in many recent hybrid
forecasting studies, which have demonstrated its efficacy. The selected studies are summarized in Table 1,
which lists the associated study, the approach (consisting of the data decomposition method and the
forecasting method) and the grade of crude oil prices used.

As can be observed from Table 1, some approaches contain more than one method in the forecasting
method column. Even though all of them follow the “decomposition-and-ensemble” framework, we
found that the approaches can be clustered into more specific categories. From our investigation,
three categories were introduced: (1) hybrid model with single forecasting method, (2) hybrid model
with multiple forecasting methods and (3) hybrid model with single or multiple forecasting method(s) and
optimization method. Each category is explained in more detail in the following subsections.
Table 1. Description of Selected Studies

Associated  Approach                                                                             Crude Oil Prices
Study       Data Decomposition  Forecasting Method                                               Grade

[23]        EEMD                LSSVM-PSO (forecast nonlinear component)                         WTI (daily, weekly, monthly)
                                GARCH (forecast time-varying component)
[24]        Wavelet Transform   RBF neural network                                               Brent (weekly)
                                PCA (reduce number of input variables)
[25]        Wavelet Transform   PSO (determine optimal parameters of MLR)                        WTI (daily)
                                MLR (model oil prices)
[6]         Wavelet Transform   ANN                                                              WTI & Brent (daily)
[13]        EMD                 FNN (model each extracted component)                             WTI & Brent (daily)
                                ALNN (aggregate all components)
[26]        Wavelet Transform   LSSVM                                                            [illegible] (monthly)
[27]        Wavelet Transform   Simple Averaging Ensemble                                        [illegible]
[5]         EEMD                APSO (optimize parameters in RVM)                                [illegible]
                                RVM (forecast each component)
[28]        EMD                 SBM (restrain the end effect during the sifting process of EMD)  [illegible]
                                FNN (model each extracted component)
[29]        Wavelet Transform   SVM                                                              WTI (daily)
[30]        Wavelet Transform   LSSVM                                                            WTI & Brent (monthly)
[11]        Wavelet Transform   ARIMA (forecast smooth series)                                   [illegible] (monthly)
                                LSSVM (forecast detail series)


3.1. Hybrid Model with Single Forecasting Method 

In this category, an approach consists of a data decomposition method and only one forecasting
method. An example is the study conducted by Shabri et al. [6], which used the wavelet transform for data
decomposition and ANN as the forecasting method. In this approach, the wavelet transform decomposes the
crude oil prices into sub-components, which then become the inputs to the ANN model that forecasts the
price series. Their results showed that the proposed approach performed better than the
single methods ANN, ARIMA and GARCH. Other studies have also shown that this category of hybrid model is
more effective than applying the single method on its own without data decomposition [26], [29], [30].


3.2. Hybrid Model with Multiple Forecasting Methods 

This category does not differ much from the first category. The only
difference is that, in this category, more than one forecasting method is incorporated into a hybrid model.
According to Wang et al. [31], the advantage of this category over the previous one is that weaknesses of
individual forecasting methods can be offset to improve forecasting accuracy. The study by Md-Khair
et al. [11] is an example that proposed such a model. They proposed a hybrid method that uses the
wavelet transform as the data decomposition method, with ARIMA forecasting the
smooth series and LSSVM forecasting the detail series. Their experiment showed that the proposed model is
superior to the single models and to hybrid models with one forecasting method in terms of prediction
accuracy. From the results of these studies, we conclude that hybrid models with multiple forecasting
methods perform better than hybrid models with a single forecasting method.
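
The structure shared by the first two categories can be sketched as a function that decomposes a series,
forecasts each component with a model passed in as an argument, and sums the component forecasts. Everything
in the example is a stand-in chosen for brevity: a trailing moving average replaces the wavelet transform,
and two deliberately trivial forecasters replace ARIMA and LSSVM. The function names and the toy series are
hypothetical, and the example makes no claim about which configuration is more accurate.

```python
def decompose(series, window=3):
    """Additive split: trailing moving average (smooth) plus remainder (detail)."""
    smooth = []
    for t in range(len(series)):
        chunk = series[max(0, t - window + 1):t + 1]
        smooth.append(sum(chunk) / len(chunk))
    detail = [x - s for x, s in zip(series, smooth)]
    return smooth, detail


def drift_model(component):
    """Stand-in forecaster: extend the last observed step by one period."""
    return component[-1] + (component[-1] - component[-2])


def mean_model(component):
    """Stand-in forecaster: predict the mean of the component."""
    return sum(component) / len(component)


def hybrid_forecast(series, smooth_model, detail_model):
    """Decompose, forecast each component separately, then sum the forecasts."""
    smooth, detail = decompose(series)
    return smooth_model(smooth) + detail_model(detail)


# Hypothetical price series, for illustration only.
series = [10.0, 12.0, 14.0, 16.0, 18.0]

# Same model for both components: hybrid model with single forecasting method.
single = hybrid_forecast(series, drift_model, drift_model)

# Different model per component: hybrid model with multiple forecasting methods.
multiple = hybrid_forecast(series, drift_model, mean_model)
```

Passing the forecasters as arguments keeps the decomposition and ensemble steps identical across the two
categories, so only the choice of component models changes.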


3.3. Hybrid Model with Single or Multiple Forecasting Method(s) and Optimization Method 

In this category, an approach combines a data decomposition method with one or more
forecasting methods and an added optimization method. The advantage of methods in this category over the
previous ones is that the optimization method can tune the parameters required by certain forecasting
methods, thus increasing forecasting accuracy. For instance, the approach proposed by Zhang et al. [23] makes
use of LSSVM with particle swarm optimization (LSSVM-PSO) to forecast the nonlinear component
and generalized autoregressive conditional heteroskedasticity (GARCH) to forecast the time-varying
component. PSO is the optimization method in their study and is used to obtain the optimal parameters
for the LSSVM method. A comparison with the EEMD plus GARCH method, the EEMD plus PSO-LSSVM
method and the PSO-LSSVM method showed that their proposed method outperforms the others with more
accurate forecasts. Another study in this category is the one conducted by Li et al. [5]. The approach
proposed in that study used EEMD for data decomposition and the relevance vector machine (RVM) for
forecasting. Adaptive PSO (APSO) acts as the optimization method and is used to simultaneously optimize
the weights and parameters of the RVM kernels. The results also showed that the proposed approach
outperforms all the single models and hybrid models included in their experiment. From the results of these
studies, we conclude that hybrid models with single or multiple forecasting method(s) and an optimization
method perform better than hybrid models with a single forecasting method and hybrid models with multiple
forecasting methods.
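
The role of the optimization method can be sketched with a basic global-best PSO that searches for the
parameter values minimising an error function. The objective below is a hypothetical stand-in for the
validation error of a forecasting model as a function of two parameters; the inertia and acceleration
coefficients, the swarm size and the function names are likewise assumptions, and the sketch is not the PSO
or APSO variant used in [23] or [5].

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iterations=50, seed=0):
    """Minimise `objective` over a box using a basic global-best PSO.

    bounds -- list of (low, high) pairs, one per parameter
    seed   -- fixes the random stream so that repeated runs give the same result
    """
    rng = random.Random(seed)
    inertia, cognitive, social = 0.7, 1.5, 1.5  # assumed coefficient values
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    velocities = [[0.0] * len(bounds) for _ in range(n_particles)]
    best_positions = [p[:] for p in positions]
    best_scores = [objective(p) for p in positions]
    g = min(range(n_particles), key=lambda i: best_scores[i])
    global_best, global_score = best_positions[g][:], best_scores[g]

    for _ in range(n_iterations):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (best_positions[i][d] - positions[i][d])
                    + social * r2 * (global_best[d] - positions[i][d])
                )
                # Keep the particle inside the search box.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            score = objective(positions[i])
            if score < best_scores[i]:
                best_positions[i], best_scores[i] = positions[i][:], score
                if score < global_score:
                    global_best, global_score = positions[i][:], score
    return global_best, global_score


def validation_error(params):
    """Hypothetical error surface with its minimum at gamma = 10, sigma = 2."""
    gamma, sigma = params
    return (gamma - 10.0) ** 2 + (sigma - 2.0) ** 2


bounds = [(0.0, 100.0), (0.1, 10.0)]
best, best_score = pso_minimize(validation_error, bounds, seed=1)
```

Every candidate evaluated by the swarm requires one evaluation of the objective, which in practice means
fitting and validating the forecasting model once per particle per iteration. Fixing the seed is one way to
make a run repeatable, since the search otherwise depends on random draws.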


3.4. Drawbacks in Hybrid Forecasting Models

From the review, some noticeable drawbacks can be identified. The first concerns the hybrid
model with single forecasting method category. In this category, any weakness in the forecasting
method cannot be eliminated from the model. For example, if the ARIMA method is used, only datasets that
are linear or near-linear will give good forecasting results [6]. The problem is that, in real-world
forecasting, data series are rarely entirely linear or entirely non-linear, so a single forecasting model is
not sufficient because no single model can successfully recognize all the patterns contained in a data
series. Therefore, this category cannot provide an optimal forecasting solution for crude oil prices, whose
volatility, nonlinearity and irregularity decrease the forecasting accuracy.

Another drawback lies in the hybrid model with single or multiple forecasting method(s) and
optimization method category. Even though this category provides the best forecasting accuracy among
the three categories, the optimization process is usually time-consuming. This drawback is supported by Li
et al. [5], who stated that a lot of time is required to find the optimal parameters and to compute the
combined kernel. Furthermore, according to them, it is quite difficult to replicate the experiment with
exactly the same result because the PSO optimization method uses many random values in its evolutionary
process. A further drawback is that the complexity of models in this category is quite high. According to
Zhang et al. [23], even though their model can capture the complex volatility of crude oil prices very well,
the calculation process is more intricate than that of most of the other methods in their experiment.


4. CONCLUSION AND FUTURE WORKS 

The complexity of crude oil price fluctuations is a major concern. Therefore, many forecasting
methods have been proposed by researchers to understand these movements, thus giving an advantage for future
planning. Over time, many forecasting methods have been studied and optimized to further
improve forecasting accuracy. In this review, a total of 12 studies that proposed hybrid methods for crude
oil price forecasting based on the “decomposition-and-ensemble” framework were
analysed. Three categories for clustering the selected studies were introduced to indicate
how a better method might be proposed, by showing which category performs better than the others.
Furthermore, the limitations of each category were analyzed so that future research can take
them into account when developing new hybrid forecasting methods. For future research, it is recommended
that more related studies be included so that a more thorough and solid categorization can be established
and more drawbacks can be identified and highlighted.